NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Comprehensive assessment of AlphaFold's predictions of secondary structure and solvent accessibility at the amino acid-level in eukaryotic, bacterial and archaeal proteins

Yu, Jing; Zhao, Bi; Kurgan, Lukasz (May 2025, Computational and Structural Biotechnology Journal)

Numerous sequence-based predictors of the amino acid (AA)-level solvent accessibility (SA) and secondary structure (SS) of proteins have been developed. We empirically investigated whether these two key characteristics of AA-level structure can be accurately predicted from putative structures generated by the popular AlphaFold2. We compared AlphaFold2's results against several representative SS and SA predictors on a large test dataset that covers five distinct taxonomic groups (animals, plants, fungi, bacteria, and archaea). We used a broad collection of metrics that evaluate predictions of the numeric and binary (buried vs. solvent exposed) SA and the 3-state SS at both AA- and SS-region levels. We found that AlphaFold2 generated very accurate results, with high average Q3 accuracy of 0.928 for the SS prediction and high Pearson Correlation Coefficient (PCC) of 0.815 between its putative and native SA values. AlphaFold2 significantly and consistently outperforms the considered predictors of SA and SS across the five taxonomic groups and both AA and region level evaluations. Moreover, we demonstrated that AlphaFold2 nearly perfectly reconstructs distributions of the sizes and numbers of the SS regions. We also showed that AlphaFold2 substantially improves over the SS and SA predictors when tested on a low sequence similarity test dataset, although its results and results of two other predictors suffer a modest drop in the quality of predicting SS regions. Altogether, our results suggest that AlphaFold2 makes very accurate predictions of SS and SA, which can be easily extracted from 200+ million pre-computed AF2's structure predictions in AlphaFoldDB.
more » « less
Free, publicly-accessible full text available May 29, 2026
Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins

https://doi.org/10.1038/s41596-023-00876-x

Kurgan, Lukasz; Hu, Gang; Wang, Kui; Ghadermarzi, Sina; Zhao, Bi; Malhis, Nawar; Erdős, Gábor; Gsponer, Jörg; Uversky, Vladimir N.; Dosztányi, Zsuzsanna (November 2023, Nature Protocols)

Full Text Available
Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins

https://doi.org/10.1016/j.csbj.2023.06.001

Zhao, Bi; Ghadermarzi, Sina; Kurgan, Lukasz (January 2023, Computational and Structural Biotechnology Journal)

Full Text Available
Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions

https://doi.org/10.3390/biom12070888

Zhao, Bi; Kurgan, Lukasz (July 2022, Biomolecules)

Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which were demonstrated for the short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs and the long and binding IDRs combined. We also investigate compositional bias for putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes with compositional bias that is the most different from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes.
more » « less
Full Text Available
CLIP: accurate prediction of disordered linear interacting peptides from protein sequences using co-evolutionary information

https://doi.org/10.1093/bib/bbac502

Peng, Zhenling; Li, Zixia; Meng, Qiaozhen; Zhao, Bi; Kurgan, Lukasz (December 2022, Briefings in Bioinformatics)

Abstract One of key features of intrinsically disordered regions (IDRs) is facilitation of protein–protein and protein–nucleic acids interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. Vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures area under receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
more » « less
Deep learning in prediction of intrinsic disorder in proteins

https://doi.org/10.1016/j.csbj.2022.03.003

Zhao, Bi; Kurgan, Lukasz (January 2022, Computational and Structural Biotechnology Journal)

Full Text Available
Surveying over 100 predictors of intrinsic disorder in proteins

https://doi.org/10.1080/14789450.2021.2018304

Zhao, Bi; Kurgan, Lukasz (December 2021, Expert Review of Proteomics)

Full Text Available
DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options

https://doi.org/10.1093/nar/gkad985

Basu, Sushmita; Zhao, Bi; Biró, Bálint; Faraggi, Eshel; Gsponer, Jörg; Hu, Gang; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Söding, Johannes; et al (November 2023, Nucleic Acids Research)

Abstract The DescribePROT database of amino acid-level descriptors of protein structures and functions was substantially expanded since its release in 2020. This expansion includes substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and to support efforts in understanding molecular underpinnings of diseases and development of therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
more » « less
Complementarity of the residue-level protein function and structure predictions in human proteins

https://doi.org/10.1016/j.csbj.2022.05.003

Biró, Bálint; Zhao, Bi; Kurgan, Lukasz (January 2022, Computational and Structural Biotechnology Journal)

Full Text Available
DescribePROT: database of amino acid-level protein structure and function predictions

https://doi.org/10.1093/nar/gkaa931

Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J; Dunker, A Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; et al (October 2020, Nucleic Acids Research)
null (Ed.)
Abstract We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accesses via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can be also downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included by DescriPROT are useful for a broad range of studies, including: investigations of protein function, applied projects focusing on therapeutics and diseases, and in the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
more » « less
Full Text Available

« Prev Next »

Search for: All records